Support and Efficiency of Nested Parallelism in OpenMP Implementations
Abstract
Nested parallelism has been a major feature of OpenMP since its very beginnings. As a programming style, it provides an elegant solution for a wide class of parallel applications, with the potential to achieve substantial utilization of the available computational resources in situations where outer-loop parallelism alone cannot. Notwithstanding its significance, nested parallelism support was slow to find its way into OpenMP implementations, commercial and research ones alike. Even today, the level of support varies greatly among compilers and runtime systems. In this work, we take a closer look at OpenMP implementations with respect to their level of support for nested parallelism. We classify them into three broad categories: those that provide full support, those that provide partial support, and those that provide no support at all. The systems surveyed include both commercial and research ones. Additionally, we quantify the efficiency of these implementations. With a representative set of compilers that provide adequate support, we perform a comparative performance evaluation. We evaluate both the incurred overheads and the compilers' overall behavior, using microbenchmarks and a full-fledged application. The results show that full support of nested parallelism does not necessarily guarantee scalable performance. Among our findings is the fact that most compilers do not seem to handle nested parallelism in a predictable and stable way as the number of threads increases beyond the system’s processor count.
Similar resources
Nested Parallelism in the OMPi OpenMP/C Compiler
This paper presents a new version of the OMPi OpenMP C compiler, enhanced by lightweight runtime support based on user-level multithreading. A large number of threads can be spawned for a parallel region and multiple levels of parallelism are supported efficiently, without introducing additional overheads to the OpenMP library. Management of nested parallelism is based on an adaptive distributi...
A Microbenchmark Study of OpenMP Overheads under Nested Parallelism
In this work we present a microbenchmark methodology for assessing the overheads associated with nested parallelism in OpenMP. Our techniques are based on extensions to the well known EPCC microbenchmark suite that allow measuring the overheads of OpenMP constructs when they are effected in inner levels of parallelism. The methodology is simple but powerful enough and has enabled us to gain int...
Portable Support and Exploitation of Nested Parallelism in OpenMP
In this paper, we present an alternative implementation of the NANOS OpenMP runtime library (NthLib) that targets portability and efficient support of multiple levels of parallelism. We have implemented the runtime libraries of available opensource OpenMP compilers on top of NthLib, reducing thus their overheads and providing them with inherent support for nested parallelism. In addition, we pr...
An Hierarchical MPI Communication Model for the Parallelized Solution of Multiple Integrals
For the modeling of polymer meso-structures, the spinodal points can be obtained by random phase approximations. The necessary number of these spinodal points to describe a phase diagram can significantly be reduced, if the usual sampling-point method is replaced by a Newton iteration, utilizing all the transiently computed data. This has the consequence that the simple inner parallelism of the...
Task-Based Execution of Nested OpenMP Loops
In this work we propose a novel technique to reduce the overheads related to nested parallel loops in OpenMP programs. In particular we show that in many cases it is possible to replace the code of a nested parallel-for loop with equivalent code that creates tasks instead of threads, thereby limiting parallelism levels while allowing more opportunities for runtime load balancing. In addition we...